A Parallel Implementation of the SOM Algorithm for Visualizing Textual Documents in a 2D Plane

Authors

  • Gustavo Arroyave
  • Oscar Ortega Lobo
  • Andrés Marín
Abstract

As computer usage grows, the number of digital documents is reaching a scale at which organizing text by hand is no longer viable. There is demand for text organization tools that operate with little human intervention and display their results on the most commonly used visual interface: the two-dimensional (2D) plane. One technique used for automatic text organization is the Self-Organizing Map (SOM), a class of artificial neural network introduced by Kohonen in 1982. With SOM, documents can be automatically arranged into groups associated with areas of a 2D plane. Most SOM implementations are sequential, that is, a single processor carries out one instruction at a time. Sequential SOM implementations work well in domains where the number of objects to organize is low. In large-scale domains such as text organization, where it is not rare to deal with thousands of documents and millions of words, sequential SOM can become infeasible in terms of computational time. To improve the performance of SOM in large-scale domains, parallel implementations of SOM have been devised. Such implementations take advantage of the inherently parallel architecture of SOM. Because a parallel implementation uses several processors, each carrying out one instruction at a time, the time required to organize the objects of the domain is reduced. In this paper, a parallel implementation of SOM is achieved using a Beowulf cluster of personal computers, each with low processing capacity. The speed-up gained by the cluster implementation was determined by measuring the execution time of both a sequential implementation and the parallel implementation on a constant text corpus.
The cluster implementation obtained will help make viable ongoing research on text organization at the University of Antioquia, where there is no funding for investing in costly parallel processors.
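
The abstract does not say how the work was divided among the cluster nodes, so the following is only an illustrative sketch under stated assumptions. It assumes the map units are split into slices, each slice standing in for one node that searches for its local best-matching unit, followed by a reduction over the local winners. The slices are processed in a plain loop here rather than on real nodes. All function names (`best_matching_unit`, `partitioned_bmu`, `train_step`, `speedup`) and the square neighborhood are hypothetical choices, not taken from the paper. Speed-up is taken in the usual sense of sequential time divided by parallel time.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def best_matching_unit(weights, x):
    """Sequential search: index of the unit whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda i: sq_dist(weights[i], x))


def partitioned_bmu(weights, x, n_workers):
    """Same search, split into n_workers slices (assumed partitioning scheme).

    Each slice stands in for one cluster node; the final min() plays the
    role of the reduction step that picks the global winner.
    """
    chunk = (len(weights) + n_workers - 1) // n_workers
    local_winners = []
    for w in range(n_workers):
        start, stop = w * chunk, min((w + 1) * chunk, len(weights))
        if start >= stop:
            continue  # more workers than units: this slice is empty
        local_winners.append(
            min(range(start, stop), key=lambda i: sq_dist(weights[i], x))
        )
    return min(local_winners, key=lambda i: sq_dist(weights[i], x))


def train_step(weights, grid, x, lr, radius):
    """One SOM update: move the winner and its grid neighbors toward x.

    grid[i] is the (row, col) position of unit i on the 2D plane; a square
    neighborhood of the given radius is assumed for simplicity.
    """
    bmu = best_matching_unit(weights, x)
    br, bc = grid[bmu]
    for i, (r, c) in enumerate(grid):
        if abs(r - br) <= radius and abs(c - bc) <= radius:
            weights[i] = [w + lr * (xi - w) for w, xi in zip(weights[i], x)]
    return bmu


def speedup(t_sequential, t_parallel):
    """Speed-up as the ratio of sequential to parallel execution time."""
    return t_sequential / t_parallel
```

In a document setting, `x` would be a term-frequency vector for one document and `grid` the layout of units on the 2D plane; the partitioned search returns the same winner as the sequential one, so only the execution time differs.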

Similar resources

Parallel implementation of underwater acoustic wave propagation using beamtracing method on graphical processing unit

The mathematical modeling of acoustic wave propagation in seawater is the basis for realizing goals such as underwater communication, seabed mapping, advanced fishing, oil and gas exploration, marine meteorology, positioning, and exploring unknown targets within the water. However, due to the existence of various physical phenomena in the water environment and the various conditions gover...

High Performance Implementation of Fuzzy C-Means and Watershed Algorithms for MRI Segmentation

Image segmentation is one of the most common steps in digital image processing. There are many image segmentation algorithms (e.g., thresholding, edge detection, and region growing) employed for classifying a digital image into different segments. In this connection, finding a suitable algorithm for medical image segmentation is a challenging task, mainly due to the noise, low contrast, and steep...

Implementation of the direction of arrival estimation algorithms by means of GPU-parallel processing in the CUDA environment (Research Article)

Direction-of-arrival (DOA) estimation of audio signals is critical in different areas, including electronic warfare, sonar, etc. Beamforming methods such as Minimum Variance Distortionless Response (MVDR) and Delay-and-Sum (DAS), and the subspace-based Multiple Signal Classification (MUSIC), are the best-known DOA estimation techniques. These methods have high computational complexity. Hence using...

Fast Spherical Self Organizing Map - Use of Indexed Geodesic Data Structure -

In order to remove the “border effect”, several spherical Self-Organizing Maps (SOM) based on the geodesic dome have been proposed. However, existing neighborhood searching methods on the geodesic dome are much more time-consuming than searching on the normal rectangular/hexagonal grid. In this paper, we present detailed descriptions of the algorithms used in training the Geodesic SOM (GeoSOM),...

Publication date: 2002